── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ purrr::compose() masks pryr::compose()
✖ lubridate::duration() masks arrow::duration()
✖ tidyr::extract() masks R.utils::extract()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ purrr::partial() masks pryr::partial()
✖ dplyr::where() masks pryr::where()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Display your machine memory.
memuse::Sys.meminfo()
Totalram: 32.000 GiB
Freeram: 12.311 GiB
In this exercise, we use tidyverse (ggplot2, dplyr, etc) to explore the MIMIC-IV data introduced in homework 1 and to build a cohort of ICU stays.
Question 1:
Q1. Visualizing patient trajectory
Visualizing a patient’s encounters in a health care system is a common task in clinical data analysis. In this question, we will visualize a patient’s ADT (admission-discharge-transfer) history and ICU vitals in the MIMIC-IV data.
Question 1.1: Graph Duplication
Reading in the files for Q1
patients <- arrow::open_dataset("~/mimic/hosp/patients.csv.gz", format = "csv")
admissions <- arrow::open_dataset("~/mimic/hosp/admissions.csv.gz", format = "csv")
transfers <- arrow::open_dataset("~/mimic/hosp/transfers.csv.gz", format = "csv")
procedures_icd <- arrow::open_dataset("~/mimic/hosp/procedures_icd.csv.gz", format = "csv")
diagnoses_icd <- arrow::open_dataset("~/mimic/hosp/diagnoses_icd.csv.gz", format = "csv")
d_icd_procedures <- arrow::open_dataset("~/mimic/hosp/d_icd_procedures.csv.gz", format = "csv")
d_icd_diagnoses <- arrow::open_dataset("~/mimic/hosp/d_icd_diagnoses.csv.gz", format = "csv")
First, I designate the patient ID in a single variable so that the TA can change it later. I then use this subject_id to filter each of the data sets.
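As a minimal sketch of this idea, the chunk below stores the ID in one variable and reuses it in a filter. The table `patients_toy`, its values, and the names `patient_id` and `patient_info` are hypothetical stand-ins, not the MIMIC-IV data or the code used in this report.

```r
library(dplyr)

# Hypothetical toy table standing in for the patients data
patients_toy <- tibble(
  subject_id = c(101, 102, 103),
  anchor_age = c(54, 61, 37)
)

# Designate the patient once; changing this value updates every filter below
patient_id <- 102

# Reuse the same variable wherever the patient is selected
patient_info <- patients_toy %>%
  filter(subject_id == patient_id)

print(patient_info)
```

The same `filter(subject_id == patient_id)` step can be repeated for each table, so only one line has to change to look at a different patient.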
# A tibble: 10 × 3
icd_code icd_version long_title
<chr> <int> <chr>
1 0001 9 Therapeutic ultrasound of vessels of head and neck
2 0002 9 Therapeutic ultrasound of heart
3 0003 9 Therapeutic ultrasound of peripheral vascular vessels
4 0009 9 Other therapeutic ultrasound
5 001 10 Central Nervous System and Cranial Nerves, Bypass
6 0010 9 Implantation of chemotherapeutic agent
7 0011 9 Infusion of drotrecogin alfa (activated)
8 0012 9 Administration of inhaled nitric oxide
9 0013 9 Injection or infusion of nesiritide
10 0014 9 Injection or infusion of oxazolidinone class of antibio…
This data already contains the columns we need. However, we also created a Parquet version of the data earlier. Let us inspect the Parquet file to decide which one we should use.
file.info("part-0.parquet")$size
[1] 152917918
This is about 153 MB, so this is the Parquet file in the labevents.filtered folder, inside the hosp folder of the mimic folder.
The top diagnoses for the patient of interest were intestinal adhesions with obstruction, acute respiratory failure with hypoxia, and von Willebrand disease. We have to make sure that each distinct long_title is counted correctly so that the counts can be passed to ggplot. I received an error regarding as_vector, so let us set it to TRUE to prevent this from happening.
[1] "Fistula of intestine"
[2] "Other secondary pulmonary hypertension"
[3] "Unspecified Escherichia coli [E. coli] as the cause of diseases classified elsewhere"
This code now displays the three most common diagnoses, which are fistula of intestine, other secondary pulmonary hypertension, and unspecified E. coli.
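One way to obtain the three most frequent titles is sketched below on made-up data. The table `diagnoses_toy` and the placeholder titles are assumptions for illustration, and the data is already in memory here, so no Arrow-specific arguments are involved.

```r
library(dplyr)

# Hypothetical diagnosis titles: A appears 4 times, B 3, C 2, D 1
diagnoses_toy <- tibble(
  long_title = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D")
)

top3 <- diagnoses_toy %>%
  count(long_title, sort = TRUE) %>%  # one row per title, most frequent first
  slice_head(n = 3) %>%               # keep the three most frequent titles
  pull(long_title)                    # return them as a character vector

print(top3)
```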
Let us now left join the procedures to their titles based on icd_code and icd_version.
When we draw the regular ggplot, the legend text is far too long, so let us wrap it.
library(stringr)
# Apply str_wrap() to wrap text for better legend display
LJProcedures$long_title_wrapped <- str_wrap(LJProcedures$long_title, width = 17)
# Check result
print(LJProcedures)
subject_id hadm_id seq_num chartdate icd_code icd_version
1 10063848 21345067 1 2177-07-25 0DB80ZZ 10
2 10063848 21345067 2 2177-07-25 0DN80ZZ 10
3 10063848 21345067 3 2177-08-03 4A023N6 10
4 10063848 21345067 4 2177-07-28 02HV33Z 10
5 10063848 24092966 1 2177-08-29 0W9G30Z 10
6 10063848 24092966 2 2177-09-04 0W9G30Z 10
long_title
1 Excision of Small Intestine, Open Approach
2 Release Small Intestine, Open Approach
3 Measurement of Cardiac Sampling and Pressure, Right Heart, Percutaneous Approach
4 Insertion of Infusion Device into Superior Vena Cava, Percutaneous Approach
5 Drainage of Peritoneal Cavity with Drainage Device, Percutaneous Approach
6 Drainage of Peritoneal Cavity with Drainage Device, Percutaneous Approach
long_title_wrapped
1 Excision of Small\nIntestine, Open\nApproach
2 Release Small\nIntestine, Open\nApproach
3 Measurement of\nCardiac Sampling\nand Pressure,\nRight Heart,\nPercutaneous\nApproach
4 Insertion of\nInfusion Device\ninto Superior\nVena Cava,\nPercutaneous\nApproach
5 Drainage of\nPeritoneal\nCavity with\nDrainage Device,\nPercutaneous\nApproach
6 Drainage of\nPeritoneal\nCavity with\nDrainage Device,\nPercutaneous\nApproach
Now let us make each procedure a unique factor level so that it can be recognized in ggplot. By setting the names, we make sure that each procedure is mapped to a different shape and color.
To make this general, we have to find the minimum and maximum charttime for each of the two stay_ids. We convert the times with as.POSIXct because that fixed the error I received.
If a time falls between the minimum and maximum of the first stay, it is assigned the first stay_id. If it falls between the minimum and maximum of the second stay, it is assigned the second.
With all the data in one table, we can now make the ggplot.
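The window-based assignment described here can be sketched as follows. All tables, times, and values are made up for illustration, and the names `stay_windows`, `events`, and `events_with_stay` are hypothetical; this is not the code used in the report.

```r
library(dplyr)

# Hypothetical min/max charttime for each of two stays
stay_windows <- tibble(
  stay_id  = c(1L, 2L),
  min_time = as.POSIXct(c("2000-01-01 00:00:00", "2000-02-01 00:00:00"), tz = "UTC"),
  max_time = as.POSIXct(c("2000-01-05 00:00:00", "2000-02-05 00:00:00"), tz = "UTC")
)

# Hypothetical measurements; the last one falls outside both windows
events <- tibble(
  charttime = as.POSIXct(
    c("2000-01-02 08:00:00", "2000-02-03 12:00:00", "2000-01-15 09:00:00"),
    tz = "UTC"
  ),
  valuenum = c(80, 95, 70)
)

# Assign the stay whose window contains the measurement time, else NA
events_with_stay <- events %>%
  mutate(stay_id = case_when(
    charttime >= stay_windows$min_time[1] & charttime <= stay_windows$max_time[1] ~ stay_windows$stay_id[1],
    charttime >= stay_windows$min_time[2] & charttime <= stay_windows$max_time[2] ~ stay_windows$stay_id[2],
    TRUE ~ NA_integer_
  ))

print(events_with_stay)
```

Rows that fall outside every window get `NA`, which makes them easy to spot and drop before plotting.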
ggplot(chartevents.filtered.arrow,
       aes(x = charttime, y = valuenum, color = factor(itemid))) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 1.5) +
  # One row of panels per vital, one column per ICU stay
  facet_grid(
    itemid ~ stay_id,
    scales = "free", space = "fixed",
    # Keys must match the itemids kept in the filter. The task statement lists
    # diastolic NBP as 220180, so confirm that 220181 is the intended key.
    labeller = labeller(itemid = c(
      "220045" = "HR", "220181" = "NBPd", "220179" = "NBPs",
      "220210" = "RR", "223761" = "Temperature"
    ))
  ) +
  labs(
    title = paste("Patient", unique(chartevents.filtered.arrow$subject_id),
                  "ICU stays - Vitals"),
    x = "", y = "", color = "Vital Type"
  ) +
  scale_x_datetime(date_labels = "%b %d %H:%M") +
  theme_minimal() +
  theme(
    strip.text.x = element_text(size = 14, face = "bold", color = "white"),
    strip.text.y = element_text(size = 12, face = "bold", color = "white"),
    strip.background = element_rect(fill = "grey80"),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.text.y = element_text(size = 10),
    legend.position = "none",
    plot.margin = margin(10, 10, 10, 10),
    panel.border = element_rect(color = "grey50", fill = NA, linewidth = 1.5)
  )
Q2. ICU stays
icustays.csv.gz (https://mimic.mit.edu/docs/iv/modules/icu/icustays/) contains data about Intensive Care Units (ICU) stays. The first 10 lines are
zcat < ~/mimic/icu/icustays.csv.gz | head
subject_id,hadm_id,stay_id,first_careunit,last_careunit,intime,outtime,los
10000032,29079034,39553978,Medical Intensive Care Unit (MICU),Medical Intensive Care Unit (MICU),2180-07-23 14:00:00,2180-07-23 23:50:47,0.4102662037037037
10000690,25860671,37081114,Medical Intensive Care Unit (MICU),Medical Intensive Care Unit (MICU),2150-11-02 19:37:00,2150-11-06 17:03:17,3.8932523148148146
10000980,26913865,39765666,Medical Intensive Care Unit (MICU),Medical Intensive Care Unit (MICU),2189-06-27 08:42:00,2189-06-27 20:38:27,0.4975347222222222
10001217,24597018,37067082,Surgical Intensive Care Unit (SICU),Surgical Intensive Care Unit (SICU),2157-11-20 19:18:02,2157-11-21 22:08:00,1.1180324074074075
10001217,27703517,34592300,Surgical Intensive Care Unit (SICU),Surgical Intensive Care Unit (SICU),2157-12-19 15:42:24,2157-12-20 14:27:41,0.948113425925926
10001725,25563031,31205490,Medical/Surgical Intensive Care Unit (MICU/SICU),Medical/Surgical Intensive Care Unit (MICU/SICU),2110-04-11 15:52:22,2110-04-12 23:59:56,1.338587962962963
10001843,26133978,39698942,Medical/Surgical Intensive Care Unit (MICU/SICU),Medical/Surgical Intensive Care Unit (MICU/SICU),2134-12-05 18:50:03,2134-12-06 14:38:26,0.8252662037037037
10001884,26184834,37510196,Medical Intensive Care Unit (MICU),Medical Intensive Care Unit (MICU),2131-01-11 04:20:05,2131-01-20 08:27:30,9.17181712962963
10002013,23581541,39060235,Cardiac Vascular Intensive Care Unit (CVICU),Cardiac Vascular Intensive Care Unit (CVICU),2160-05-18 10:00:53,2160-05-19 17:33:33,1.314351851851852
Q2.1 Ingestion
Import icustays.csv.gz as a tibble icustays_tble.
icustays_arrow <- arrow::open_dataset("~/mimic/icu/icustays.csv", format = "csv")
icustays_tble <- icustays_arrow %>%
  collect() %>% # Pulls data into memory
  as_tibble()
glimpse(icustays_tble)
The answer is 41: some subjects do have multiple stays in the intensive care unit, and the maximum is one subject with 41 stays.
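A count like this can be sketched on toy data as shown below. The table `icustays_toy` and its values are made up, and `icustayscount_toy` is a hypothetical name that mirrors the one-row-per-subject table with a column `n` used for plotting.

```r
library(dplyr)

# Hypothetical ICU stays: subject 1 has three stays, subject 3 has two
icustays_toy <- tibble(
  subject_id = c(1, 1, 1, 2, 3, 3),
  stay_id    = 11:16
)

# One row per subject, with n = number of ICU stays
icustayscount_toy <- icustays_toy %>%
  count(subject_id)

max_stays    <- max(icustayscount_toy$n)      # largest number of stays for one subject
any_multiple <- any(icustayscount_toy$n > 1)  # TRUE if any subject has several stays

print(icustayscount_toy)
```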
We can make a bar chart to illustrate the number of ICU stays per subject. I chose a log scale for the y-axis because the data is highly skewed, which makes a plain bar chart hard to read.
ggplot(icustayscount, aes(x = n)) +
  geom_bar(fill = "pink") +
  scale_y_log10() +
  labs(
    title = "Distribution of ICU Stays per Subject",
    x = "Number of ICU Stays",
    y = "Number of Patients (Log Scale)"
  ) +
  theme_minimal()
This shows that most subjects have only one stay in the intensive care unit.
Q3 Admissions Data
Information of the patients admitted into hospital is available in admissions.csv.gz. See https://mimic.mit.edu/docs/iv/modules/hosp/admissions/ for details of each field in this file. The first 10 lines are
Summarize the following information by graphics and explain any patterns you see.
number of admissions per patient
admission hour (anything unusual?)
admission minute (anything unusual?)
length of hospital stay (from admission to discharge) (anything unusual?)
Let us start by making a graph of the number of admissions per patient
Before we make the graph, we need to summarize the data for the subject_ids and group them
ggplot(admissionscount, aes(x = n)) +
  geom_bar(fill = "pink", color = "black") +
  labs(
    title = "Distribution of Admissions per Subject",
    x = "Number of Admissions",
    y = "Number of Patients"
  ) +
  coord_cartesian(xlim = c(1, 25)) +
  theme_minimal()
I do not see any unusual patterns. The main pattern is that most subjects are admitted only once; some are admitted more than once, but this becomes increasingly rare as the number of admissions grows. A possible reason is that many subjects come in for curable or fixable issues and leave with a solution. Unless a patient has an ongoing disease that requires regular hospital care, a single admission is the norm.
Now let us make a graph summarizing the admission hour (anything unusual?)
ggplot(admissions_tble, aes(x = hour(admittime))) +
  geom_bar(fill = "pink", color = "black") +
  labs(
    title = "Distribution of Admission Hour",
    x = "Hour",
    y = "Number of Admissions"
  ) +
  theme_minimal()
This graph shows the hour at which admissions occur. Interestingly, most admissions occur in the 12th hour (noon), whereas one might expect admissions to peak in the morning. In addition, admissions in the 23rd and 0th hours are high compared with their neighboring hours. This could make sense because some illnesses, like asthma (a disease that I suffer from), are more likely to get worse at night. Individuals may also try to manage their symptoms throughout the day until they realize that they cannot, and they would make this decision before going to bed (11 pm to 12 am).
Now let us make a graph summarizing the admission minute (anything unusual?)
ggplot(admissions_tble, aes(x = minute(admittime))) +
  geom_bar(fill = "pink", color = "black") +
  labs(
    title = "Distribution of Admission Minute",
    x = "Minute",
    y = "Number of Admissions"
  ) +
  theme_minimal()
What is unusual is that admissions are concentrated at each quarter of the hour (minutes 0, 15, 30, and 45). Minute 60 has no admissions, which makes sense because minute 60 is recorded as minute 0 of the next hour. A likely reason for the spikes is that nurses or doctors round the admission time to the nearest quarter hour, for convenience and to make it easier to keep track of patients.
Now let us make a graph summarizing the length of hospital stay (from admission to discharge) (anything unusual?)
Making sure admittime and dischtime are parsed as date-times in the correct format
admissions_tble <- admissions_tble %>%
  mutate(
    admittime = as.POSIXct(admittime, format = "%Y-%m-%d %H:%M:%S"),
    dischtime = as.POSIXct(dischtime, format = "%Y-%m-%d %H:%M:%S")
  )
Computing length_of_stay in days as the difference between the discharge time and the admission time
admissions_tble <- admissions_tble %>%
  mutate(length_of_stay = difftime(dischtime, admittime, units = "days"))
Graphing the data now
ggplot(admissions_tble, aes(x = length_of_stay)) +
  geom_histogram(fill = "pink", color = "black", bins = 200) +
  labs(
    title = "Distribution of Length of Stay",
    x = "Length of Stay (Days)",
    y = "Number of Admissions"
  ) +
  coord_cartesian(xlim = c(1, 100)) +
  theme_minimal()
Don't know how to automatically pick scale for object of type <difftime>.
Defaulting to continuous.
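The message above appears because the x variable is a difftime object rather than a plain number. A minimal sketch of converting such a value to numeric days, using made-up times and hypothetical variable names, is:

```r
# Hypothetical admission and discharge times (not MIMIC-IV data)
admit <- as.POSIXct("2000-01-01 08:00:00", tz = "UTC")
disch <- as.POSIXct("2000-01-03 20:00:00", tz = "UTC")

# difftime keeps its unit as metadata, which ggplot2 has no default scale for
los <- difftime(disch, admit, units = "days")

# Converting to a plain number of days gives an ordinary continuous variable
los_days <- as.numeric(los, units = "days")

print(los_days)
```

Mapping a numeric column like `los_days` to the x aesthetic should avoid the message, since ggplot2 then picks a continuous scale directly.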
Looking at this chart, nothing is really unusual. The one thing that stands out is that admitted individuals usually spend more than one day in the hospital rather than being discharged the same day. This makes sense, as hospitals want to make sure that a patient is healthy before discharge. In other words, staying overnight for observation and to rule out complications would explain why stays of several days are more common than same-day discharges.
Q4.1 Ingestion
Import patients.csv.gz (https://mimic.mit.edu/docs/iv/modules/hosp/patients/) as a tibble patients_tble.
ggplot(patients_tble, aes(x = gender)) +
  geom_bar(fill = "pink", color = "black") +
  labs(
    title = "Distribution of Patients by Gender",
    x = "Gender",
    y = "Number of Patients"
  ) +
  theme_minimal()
It looks like most of the patients are recorded as female. This may be because women visit the doctor more often, for reproductive health check-ups and, depending on their age, yearly cancer screenings.
Now we have to do it by age
ggplot(patients_tble, aes(x = anchor_age)) +
  geom_histogram(fill = "pink", color = "black", bins = 75) +
  labs(
    title = "Distribution of Patients by Age",
    x = "Age",
    y = "Number of Patients"
  ) +
  coord_cartesian(xlim = c(0, 100)) +
  theme_minimal()
This data is indicative of age rounding, where either the patient or the practitioner rounds the age of the patient up or down. There is also no data before about 20 years of age or past 87.5 years of age. As we know, these ages are possible, so they are simply not being reported. We also see one point where data is missing completely, which could be a further sign of age rounding or of age not being recorded when the patient was examined.
The large number of young adult admissions could be due to the onset of diseases that do not show up until young adulthood, like diabetes, MS, or schizophrenia. Individuals of this age are also still on their parents’ health insurance, so they may be more likely to visit the hospital since it would not be on their dime. This would then explain the dip in the 30s: individuals are on their own health insurance now and have to be more financially conscientious since their insurance is likely expensive. These adults are also more likely to be healthy compared to other age ranges. The last peak is quite indicative of a cutoff or even an issue with data entry.
Q5 Lab results
labevents.csv.gz (https://mimic.mit.edu/docs/iv/modules/hosp/labevents/) contains all laboratory measurements for patients. The first 10 lines are
zcat < ~/mimic/hosp/labevents.csv.gz | head
labevent_id,subject_id,hadm_id,specimen_id,itemid,order_provider_id,charttime,storetime,value,valuenum,valueuom,ref_range_lower,ref_range_upper,flag,priority,comments
1,10000032,,2704548,50931,P69FQC,2180-03-23 11:51:00,2180-03-23 15:56:00,___,95,mg/dL,70,100,,ROUTINE,"IF FASTING, 70-100 NORMAL, >125 PROVISIONAL DIABETES."
2,10000032,,36092842,51071,P69FQC,2180-03-23 11:51:00,2180-03-23 16:00:00,NEG,,,,,,ROUTINE,
3,10000032,,36092842,51074,P69FQC,2180-03-23 11:51:00,2180-03-23 16:00:00,NEG,,,,,,ROUTINE,
4,10000032,,36092842,51075,P69FQC,2180-03-23 11:51:00,2180-03-23 16:00:00,NEG,,,,,,ROUTINE,"BENZODIAZEPINE IMMUNOASSAY SCREEN DOES NOT DETECT SOME DRUGS,;INCLUDING LORAZEPAM, CLONAZEPAM, AND FLUNITRAZEPAM."
5,10000032,,36092842,51079,P69FQC,2180-03-23 11:51:00,2180-03-23 16:00:00,NEG,,,,,,ROUTINE,
6,10000032,,36092842,51087,P69FQC,2180-03-23 11:51:00,,,,,,,,ROUTINE,RANDOM.
7,10000032,,36092842,51089,P69FQC,2180-03-23 11:51:00,2180-03-23 16:15:00,,,,,,,ROUTINE,PRESUMPTIVELY POSITIVE.
8,10000032,,36092842,51090,P69FQC,2180-03-23 11:51:00,2180-03-23 16:00:00,NEG,,,,,,ROUTINE,METHADONE ASSAY DETECTS ONLY METHADONE (NOT OTHER OPIATES/OPIOIDS).
9,10000032,,36092842,51092,P69FQC,2180-03-23 11:51:00,2180-03-23 16:00:00,NEG,,,,,,ROUTINE,"OPIATE IMMUNOASSAY SCREEN DOES NOT DETECT SYNTHETIC OPIOIDS;SUCH AS METHADONE, OXYCODONE, FENTANYL, BUPRENORPHINE, TRAMADOL,;NALOXONE, MEPERIDINE. SEE ONLINE LAB MANUAL FOR DETAILS."
d_labitems.csv.gz (https://mimic.mit.edu/docs/iv/modules/hosp/d_labitems/) is the dictionary of lab measurements.
zcat < ~/mimic/hosp/d_labitems.csv.gz | head
itemid,label,fluid,category
50801,Alveolar-arterial Gradient,Blood,Blood Gas
50802,Base Excess,Blood,Blood Gas
50803,"Calculated Bicarbonate, Whole Blood",Blood,Blood Gas
50804,Calculated Total CO2,Blood,Blood Gas
50805,Carboxyhemoglobin,Blood,Blood Gas
50806,"Chloride, Whole Blood",Blood,Blood Gas
50808,Free Calcium,Blood,Blood Gas
50809,Glucose,Blood,Blood Gas
50810,"Hematocrit, Calculated",Blood,Blood Gas
We are interested in the lab measurements of creatinine (50912), potassium (50971), sodium (50983), chloride (50902), bicarbonate (50882), hematocrit (51221), white blood cell count (51301), and glucose (50931). Retrieve a subset of labevents.csv.gz containing only these items for the patients in icustays_tble. Further restrict to the last available measurement (by storetime) before the ICU stay. The final labevents_tble should have one row per ICU stay and columns for each lab measurement.
First, let us create the labevents file
labevents <- arrow::open_dataset("~/mimic/hosp/labevents.csv", format ="csv")
Now let us make it a parquet so we can make the directory after
arrow::write_dataset(labevents, path ="~/mimic/hosp/labevents_pq", format ="parquet")
The symbolic link was already made, but let us make sure it is still there
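We now have to load the data into R and then join it with icustays_tble by subject_id. The chunk below uses a left_join, which is slow on data of this size (a run took about 30 minutes); an inner_join would keep only the lab events that match a patient in icustays_tble.
labevents_pq.filtered.icu <- labevents_pq.filtered %>%
  collect() %>%
  left_join(icustays_tble, by = c("subject_id"))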
Warning in left_join(., icustays_tble, by = c("subject_id")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 5786 of `x` matches multiple rows in `y`.
ℹ Row 1 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
Let us view the table to see if we are all set so far
We need to filter for a charttime earlier than the intime and then group by the three key variables: subject_id, stay_id, and itemid. We then slice within each group, following Dr. Zhou's instructions on Slack and what we saw in class, and ungroup afterwards.
Warning: Values from `valuenum` are not uniquely identified; output will contain
list-cols.
• Use `values_fn = list` to suppress this warning.
• Use `values_fn = {summary_fun}` to summarise duplicates.
• Use the following dplyr code to identify duplicates.
{data} |>
dplyr::summarise(n = dplyr::n(), .by = c(subject_id, stay_id, itemid)) |>
dplyr::filter(n > 1L)
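The warning above typically means that more than one row survived per (subject_id, stay_id, itemid) group before widening, for example because of ties on the timestamp. Below is a minimal sketch of one way to keep exactly one row per group, using `slice_max()` with `with_ties = FALSE`. The toy tables, their values, and the `item_` column prefix are assumptions for illustration. The sketch filters on storetime as in the task statement, and charttime can be swapped in if that is the column being used.

```r
library(dplyr)
library(tidyr)

# Hypothetical ICU stays for one subject
icu_toy <- tibble(
  subject_id = c(1, 1),
  stay_id    = c(10, 11),
  intime     = as.POSIXct(c("2000-01-05 00:00:00", "2000-02-05 00:00:00"), tz = "UTC")
)

# Hypothetical lab events (itemid, time stored, numeric value)
labs_toy <- tibble(
  subject_id = 1,
  itemid     = c(50912L, 50912L, 50971L, 50912L),
  storetime  = as.POSIXct(
    c("2000-01-01 10:00:00", "2000-01-03 10:00:00",
      "2000-01-02 10:00:00", "2000-01-20 10:00:00"),
    tz = "UTC"
  ),
  valuenum   = c(1.0, 1.4, 4.2, 0.9)
)

labs_wide <- labs_toy %>%
  # Each lab row can match several stays, so the relationship is declared
  inner_join(icu_toy, by = "subject_id", relationship = "many-to-many") %>%
  filter(storetime < intime) %>%                      # labs stored before ICU admission
  group_by(subject_id, stay_id, itemid) %>%
  slice_max(storetime, n = 1, with_ties = FALSE) %>%  # exactly one row per group
  ungroup() %>%
  select(subject_id, stay_id, itemid, valuenum) %>%
  pivot_wider(names_from = itemid, values_from = valuenum, names_prefix = "item_")

print(labs_wide)
```

Because each group contributes a single value, `pivot_wider()` returns ordinary numeric columns instead of list-columns.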
chartevents.csv.gz (https://mimic.mit.edu/docs/iv/modules/icu/chartevents/) contains all the charted data available for a patient. During their ICU stay, the primary repository of a patient’s information is their electronic chart. The itemid variable indicates a single measurement type in the database. The value variable is the value measured for itemid. The first 10 lines of chartevents.csv.gz are
Looking at the first few lines of the chartevents.csv.gz
We are interested in the vitals for ICU patients: heart rate (220045), systolic non-invasive blood pressure (220179), diastolic non-invasive blood pressure (220180), body temperature in Fahrenheit (223761), and respiratory rate (220210). Retrieve a subset of chartevents.csv.gz only containing these items for the patients in icustays_tble. Further restrict to the first vital measurement within the ICU stay. The final chartevents_tble should have one row per ICU stay and columns for each vital measurement.
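The selection described in this task can be sketched on toy data as follows. The tables `icu_toy` and `vitals_toy`, their values, and the `item_` column prefix are hypothetical, and the sketch assumes the vitals table carries a stay_id column; it illustrates the logic rather than the exact code used in this report.

```r
library(dplyr)
library(tidyr)

# Hypothetical single ICU stay
icu_toy <- tibble(
  subject_id = 1,
  stay_id    = 10,
  intime     = as.POSIXct("2000-01-05 00:00:00", tz = "UTC"),
  outtime    = as.POSIXct("2000-01-07 00:00:00", tz = "UTC")
)

# Hypothetical vitals; the first row is charted before the ICU stay begins
vitals_toy <- tibble(
  subject_id = 1,
  stay_id    = 10,
  itemid     = c(220045L, 220045L, 220210L, 220045L),
  charttime  = as.POSIXct(
    c("2000-01-04 23:00:00", "2000-01-05 01:00:00",
      "2000-01-05 02:00:00", "2000-01-05 03:00:00"),
    tz = "UTC"
  ),
  valuenum   = c(90, 88, 18, 85)
)

vitals_wide <- vitals_toy %>%
  inner_join(icu_toy, by = c("subject_id", "stay_id")) %>%
  filter(charttime >= intime, charttime <= outtime) %>%  # within the ICU stay
  group_by(subject_id, stay_id, itemid) %>%
  slice_min(charttime, n = 1, with_ties = FALSE) %>%     # first measurement only
  ungroup() %>%
  select(subject_id, stay_id, itemid, valuenum) %>%
  pivot_wider(names_from = itemid, values_from = valuenum, names_prefix = "item_")

print(vitals_wide)
```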
Let us create the chartevents file, write it as a parquet, and then create a symbolic link to it
chartevents6 <- arrow::open_dataset("~/mimic/icu/chartevents.csv", format ="csv")
arrow::write_dataset(chartevents6, path ="~/mimic/hosp/chartevents_pq", format ="parquet")
Now I have to make a symbolic link to the chartevents_pq by running the following commands in the terminal
Everything is read in. We now have to make the table
icu_adults <- icustays_tble %>%
  inner_join(admissions_tble, by = c("subject_id", "hadm_id")) %>%
  inner_join(patients_tble, by = "subject_id") %>%
  filter(anchor_age >= 18) # Use anchor_age instead of calculating from dob
names(icu_adults)
ggplot(mimic_icu_cohort, aes(x = fct_infreq(race), y = los)) +
  geom_col(fill = "pink") +
  labs(title = "Length of ICU Stay vs Race", x = "Race", y = "LOS (days)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Warning: Removed 14 rows containing missing values or values outside the scale range
(`geom_col()`).
Although this is a nice graph, I would much rather show the average length of stay.
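With one row per ICU stay, geom_col() stacks the bars that share an x value, so the bar heights are totals rather than averages. A minimal sketch of computing a per-group mean first, on made-up data with hypothetical names `cohort_toy` and `summary_toy`, is:

```r
library(dplyr)

# Hypothetical cohort: one row per ICU stay
cohort_toy <- tibble(
  race = c("A", "A", "B", "B", "B", "C"),
  los  = c(1, 3, 2, 4, NA, 10)
)

summary_toy <- cohort_toy %>%
  group_by(race) %>%
  summarise(
    mean_los = mean(los, na.rm = TRUE),  # average LOS, ignoring missing values
    n        = n()                       # number of stays behind each average
  ) %>%
  arrange(desc(mean_los))

print(summary_toy)
```

Keeping `n` next to the mean helps when reading the plot, since an average based on very few stays can look extreme.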
ggplot(mimic_icu_summary, aes(x = fct_infreq(race), y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs Race", x = "Race", y = "Average LOS (days)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
LOS vs. Insurance
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(insurance) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = fct_infreq(insurance), y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs Insurance Type", x = "Insurance", y = "Average LOS (days)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
When I run unique(mimic_icu_cohort$insurance), one of the insurance values is “”, in other words, blank.
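One possible way to handle such blank values is to recode them as missing so that they can be counted or dropped explicitly. The sketch below uses made-up insurance labels and hypothetical names `cohort_toy`, `cohort_clean`, and `n_missing`; it is an illustration, not a step taken in this report.

```r
library(dplyr)

# Hypothetical insurance column containing blank strings
cohort_toy <- tibble(
  insurance = c("Medicare", "", "Other", "")
)

# Replace empty strings with NA so they are treated as missing
cohort_clean <- cohort_toy %>%
  mutate(insurance = na_if(insurance, ""))

n_missing <- sum(is.na(cohort_clean$insurance))

print(cohort_clean)
```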
LOS vs. Marital Status
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(marital_status) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = fct_infreq(marital_status), y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs Marital Status", x = "Marital Status", y = "Average LOS (days)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
LOS vs. Gender
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(gender) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = gender, y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs Gender", x = "Gender", y = "Average LOS (days)") +
  theme_minimal()
LOS vs. Anchor_age
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(anchor_age) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = anchor_age, y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs Age", x = "Age at Admission", y = "Average LOS (days)") +
  theme_minimal()
LOS vs. Last Available Lab Measurements Before ICU Stay
LOS vs. Lab Creatinine
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Creatinine) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
# fill sets the bar color and color sets the outline; passing color twice
# triggers a "Duplicated aesthetics" warning
ggplot(mimic_icu_summary, aes(x = Creatinine, y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs Creatinine", x = "Creatinine (mg/dL)", y = "Average LOS (days)") +
  theme_minimal() +
  coord_cartesian(xlim = c(0, 75))
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Lab Potassium
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Potassium) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = Potassium, y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs Potassium", x = "Potassium (mEq/L)", y = "Average LOS (days)") +
  theme_minimal()
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Lab Sodium
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Sodium) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = Sodium, y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs Sodium", x = "Sodium (mEq/L)", y = "Average LOS (days)") +
  theme_minimal()
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Lab Chloride
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Chloride) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = Chloride, y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs Chloride", x = "Chloride (mEq/L)", y = "Average LOS (days)") +
  theme_minimal()
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Lab Bicarbonate
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Bicarbonate) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = Bicarbonate, y = mean_los)) +
  geom_col(color = "black") +
  labs(title = "Average Length of ICU Stay vs Bicarbonate", x = "Bicarbonate (mEq/L)", y = "Average LOS (days)") +
  theme_minimal()
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Lab Hematocrit
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Hematocrit) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = Hematocrit, y = mean_los)) +
  geom_col(color = "pink") +
  labs(title = "Average Length of ICU Stay vs Hematocrit", x = "Hematocrit (%)", y = "Average LOS (days)") +
  theme_minimal()
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Lab Glucose
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Glucose) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = Glucose, y = mean_los)) +
  geom_col(color = "pink") +
  labs(title = "Average Length of ICU Stay vs Glucose", x = "Glucose (mg/dL)", y = "Average LOS (days)") +
  theme_minimal()
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Lab WBC
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(WBC) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = WBC, y = mean_los)) +
  geom_col(color = "pink") +
  labs(title = "Average Length of ICU Stay vs White Blood Cell Count", x = "WBC (K/uL)", y = "Average LOS (days)") +
  theme_minimal()
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Vital Heart Rate
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Heart_Rate) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = Heart_Rate, y = mean_los)) +
  geom_col(color = "pink") +
  labs(title = "Average Length of ICU Stay vs Heart_Rate", x = "Heart_Rate", y = "Average LOS (days)") +
  theme_minimal() +
  coord_cartesian(xlim = c(0, 250))
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Vital SysBP
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(SysBP) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = SysBP, y = mean_los)) +
  geom_col(color = "pink") +
  labs(title = "Average Length of ICU Stay vs SysBP", x = "SysBP", y = "Average LOS (days)") +
  theme_minimal() +
  coord_cartesian(xlim = c(0, 300))
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Vital DiaBP
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(DiaBP) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = DiaBP, y = mean_los)) +
  geom_col(color = "pink") +
  labs(title = "Average Length of ICU Stay vs DiaBP", x = "DiaBP", y = "Average LOS (days)") +
  theme_minimal() +
  coord_cartesian(xlim = c(0, 250))
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Vital Temp
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Temp) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = Temp, y = mean_los)) +
  geom_col(color = "pink") +
  labs(title = "Average Length of ICU Stay vs Temp", x = "Temp", y = "Average LOS (days)") +
  theme_minimal() +
  coord_cartesian(xlim = c(0, 150))
Warning: `position_stack()` requires non-overlapping x intervals.
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. Vital Respiratory_Rate
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(Respiratory_Rate) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = Respiratory_Rate, y = mean_los)) +
  geom_col(color = "pink") +
  labs(title = "Average Length of ICU Stay vs Respiratory_Rate", x = "Respiratory_Rate", y = "Average LOS (days)") +
  theme_minimal() +
  coord_cartesian(xlim = c(0, 200))
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
LOS vs. First ICU_Unit
mimic_icu_summary <- mimic_icu_cohort %>%
  group_by(first_careunit) %>%
  summarise(mean_los = mean(los, na.rm = TRUE))
ggplot(mimic_icu_summary, aes(x = fct_infreq(first_careunit), y = mean_los)) +
  geom_col(fill = "pink", color = "black") +
  labs(title = "Average Length of ICU Stay vs first_careunit", x = "first_careunit", y = "Average LOS (days)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))